Wake up remote Claude Code agents on new events #9399
Co-Authored-By: Oz <oz-agent@warp.dev>
Overview
This PR adds task-scoped orchestration messaging, remote-child wake-up via run followups, restored remote child pane handling, and persistence for remote-child metadata.
Concerns
- The CLI now rejects --task-id with --conversation, which can break existing worker follow-up invocations that still pass both before the task-backed resume path runs.
- V1 restored child lifecycle behavior appears to have been removed, and the poller restore path is no longer gated on OrchestrationV2, so disabled-v2 restores can regress.
- The remote-child follow-up submission has no bounded timeout, so a stalled request can leave the wake permanently pending.
- The local Claude wake path can mark the server task InProgress before verifying the conversation is still ready, leaving task state inconsistent if the final readiness check fails.
- Security pass: no separate security-specific findings beyond the correctness issues above.
Verdict
Found: 0 critical, 5 important, 0 suggestions
Request changes
Comment /oz-review on this pull request to retrigger a review (up to 3 times on the same pull request).
Powered by Oz
…-new-events Co-Authored-By: Oz <oz-agent@warp.dev>
cephalonaut left a comment
I think the follow-up-based remote child wake-ups ought to be driven from the server side. One good reason, I think, is that ultimately we will support arbitrary subscriptions to event sources, and having the wake-ups driven from the event-source side seems impractical. Let me know if that doesn't make sense!
Given that, I think the Claude wake-up code also ought to move. Most of it is harness-specific: Claude transcript envelopes, session-index files, claude --resume command construction, and parent-bridge state staging. You could move it to a wake_driver.rs alongside the parent bridge and expose a single wake_dormant_session method to the controller. The controller keeps the EventsReady subscription, readiness gating, ExecuteLocalHarnessCommand emission, and conversation-status update; the harness owns eligibility, context fetch, on-disk staging, command construction, and reopen/rollback.
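The split proposed above could be sketched roughly as follows. This is an illustration only: apart from wake_dormant_session, every name here (ConversationId, WakeOutcome, WakeError, WakeDriver, Controller, StubDriver, on_events_ready) is hypothetical, and the command string is a placeholder rather than what the real harness builds.

```rust
/// Placeholder identifier type for illustration.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct ConversationId(u64);

/// What the harness hands back to the controller (hypothetical shape).
#[derive(Debug, PartialEq, Eq)]
enum WakeOutcome {
    /// The harness staged its state and built a command for the controller to emit.
    Command(String),
    /// The conversation is not eligible for a wake; nothing was staged.
    NotEligible,
}

#[derive(Debug, PartialEq, Eq)]
struct WakeError(String);

/// Harness-side surface: eligibility, context fetch, on-disk staging,
/// command construction, and rollback all live behind this one method.
trait WakeDriver {
    fn wake_dormant_session(&mut self, id: ConversationId) -> Result<WakeOutcome, WakeError>;
}

/// Controller side: keeps readiness gating and command emission only.
struct Controller<D: WakeDriver> {
    driver: D,
    emitted: Vec<String>,
}

impl<D: WakeDriver> Controller<D> {
    /// Returns Ok(true) when a wake command was emitted.
    fn on_events_ready(&mut self, id: ConversationId, ready: bool) -> Result<bool, WakeError> {
        if !ready {
            return Ok(false); // readiness gating stays in the controller
        }
        match self.driver.wake_dormant_session(id)? {
            WakeOutcome::Command(cmd) => {
                self.emitted.push(cmd); // stands in for emitting the harness command
                Ok(true)
            }
            WakeOutcome::NotEligible => Ok(false),
        }
    }
}

/// Stub harness used only to exercise the sketch.
struct StubDriver {
    eligible: Vec<ConversationId>,
}

impl WakeDriver for StubDriver {
    fn wake_dormant_session(&mut self, id: ConversationId) -> Result<WakeOutcome, WakeError> {
        if !self.eligible.contains(&id) {
            return Ok(WakeOutcome::NotEligible);
        }
        // Placeholder command; the real construction is harness-specific.
        Ok(WakeOutcome::Command(format!("claude --resume session-{}", id.0)))
    }
}
```

With this shape the controller never sees transcript envelopes or session-index files; it only decides whether to ask for a wake and what to do with the returned command.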
Overview
This PR adds remote-child orchestration wake/messaging plumbing, task-scoped agent messaging APIs, persisted remote-child metadata, restored hidden child pane behavior, and a new Codex harness path.
Concerns
- Remote child pending message events are drained locally without submitting the advertised run follow-up, which can drop the wake signal and leave the child agent dormant.
- The new Codex auth seeding writes API keys into an existing auth file without correcting permissive permissions.
- Codex harness conversion for server GraphQL values is brittle and can still map Codex runs to Unknown.
Verdict
Found: 1 critical, 2 important, 0 suggestions
Request changes
cephalonaut left a comment
Second-pass review after the wake refactor. Most comments are small (dead code, a fragile match, a possible regression, requests for in-code documentation, two extraction nits). Two larger architectural notes are embedded in the comments on maybe_prepare_local_claude_wake and prepare_local_wake_command: the wake decision belongs in a harness-side supervisor, and the streamer should not subscribe to dormant Claude-harness conversations at all, which collapses most of the wake's bookkeeping.
cephalonaut left a comment
Third pass after the wake-listener fix. The fix to the wake-trigger gap looks good. Three small things below.
    .await
    }
    None => ai_client.send_agent_message(request).await,
    }
send_agent_message_with_timeout has no timeout on wasm
Not sure this is really invoked from wasm, but maybe add a comment.
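A minimal sketch of the kind of in-code note this comment asks for. The real function is async and its signature is not shown in this thread, so the std-thread version below is a stand-in: send_with_timeout, SendError, and the Option<Duration> parameter are all assumptions made for illustration.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

#[derive(Debug, PartialEq)]
enum SendError {
    TimedOut,
    Disconnected,
}

/// Runs `send` and waits at most `timeout` for its result.
///
/// NOTE: when `timeout` is `None` (assumed here to model a target with no
/// timer available, such as wasm), the call is unbounded and a stalled
/// request will block the caller indefinitely.
fn send_with_timeout<T, F>(timeout: Option<Duration>, send: F) -> Result<T, SendError>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    match timeout {
        Some(limit) => {
            let (tx, rx) = mpsc::channel();
            thread::spawn(move || {
                // The receiver may already be gone after a timeout; ignore that.
                let _ = tx.send(send());
            });
            rx.recv_timeout(limit).map_err(|e| match e {
                mpsc::RecvTimeoutError::Timeout => SendError::TimedOut,
                mpsc::RecvTimeoutError::Disconnected => SendError::Disconnected,
            })
        }
        // Unbounded path: documented above so callers are not surprised.
        None => Ok(send()),
    }
}
```

The point is only that the unbounded arm carries an explicit comment; whether that arm is reachable from wasm in the actual code is the open question raised above.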
    if me.is_dormant_claude_wake_listener_eligible(conversation_id, ctx) {
        me.start_dormant_claude_wake_listener(conversation_id, ctx);
    }
    }
Wake-listener restart has no backoff
On Ok(None) or Err, the callback immediately restarts the wake-only listener if it is still eligible, with no attempt counter at this layer. The inner AgentEventDriverConfig::retry_forever backs off across SSE failures, but a clean-but-empty stream close would restart in a tight loop. Add a small backoff or a bounded retry count for the outer restart.
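One possible shape for the outer restart policy, as a sketch under assumptions: RestartPolicy and delay_for are hypothetical names, and the doubling schedule and limits are illustrative rather than taken from the PR.

```rust
use std::time::Duration;

/// Hypothetical policy for restarting the wake-only listener.
#[derive(Debug, Clone, Copy)]
struct RestartPolicy {
    base: Duration,
    max_delay: Duration,
    max_attempts: u32,
}

impl RestartPolicy {
    /// Delay to wait before restart number `attempt` (0-based), or `None`
    /// once the retry budget is exhausted and the listener should stay down.
    fn delay_for(&self, attempt: u32) -> Option<Duration> {
        if attempt >= self.max_attempts {
            return None;
        }
        // Double the delay on each attempt, saturating instead of overflowing,
        // then cap it at `max_delay`.
        let factor = 1u32.checked_shl(attempt).unwrap_or(u32::MAX);
        Some(self.base.saturating_mul(factor).min(self.max_delay))
    }
}
```

The caller would keep the attempt counter next to the listener state and reset it to zero whenever the listener actually delivers an event, so only consecutive empty closes or errors consume the budget.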
    tx,
    self_run_id,
    hydrator,
    hydrate_new_messages: true,
hydrate_new_messages: dead flag in production?
The struct field is only set to false in the test sse_forwarding_consumer_skips_message_hydration_when_disabled; start_sse_connection (this line) hardcodes true. Either remove it (YAGNI) or document the upcoming caller that will toggle it.
Description
Fixes orchestration v2 parent/child agent wake-up and messaging behavior for remote child agents.
This PR updates the client-side orchestration flow so that incoming parent-agent messages wake remote child agents through the server run follow-up path instead of trying to treat them like local dormant Claude harnesses. Previously, a remote child could receive the parent’s message, but it would not be restarted correctly in a harness and could fail or hang when trying to send a message back to the parent.
Main changes:
• Adds a remote-child wake path in the blocklist AI controller:
◦ detects remote child conversations with pending parent-agent message events
◦ submits a run follow-up to agent/runs/{run_id}/followups
◦ removes delivered pending message events after successful follow-up submission
◦ retries/logs failures instead of silently hanging
• Keeps local dormant Claude wake behavior separate from remote child wake behavior.
• Restores remote hidden child panes as cloud/ambient agent panes instead of local terminal-backed child panes.
• Ensures restored remote child panes enter the existing ambient session in AgentRunning state.
• Persists and restores remote-child conversation metadata so the client can distinguish local children from remote children across reloads.
• Improves orchestration v2 message sending:
◦ uses task-scoped server APIs when available
◦ adds bounded timeouts and error logging for send failures
◦ surfaces failures instead of leaving action execution indefinitely pending
• Adds regression coverage for:
◦ remote child conversation restoration
◦ remote child pane/session state
◦ task-scoped ambient agent messaging
◦ orchestration v2 message/error behavior
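The remote-child wake path described in the first bullet can be sketched as below. This is not the PR's code: PendingEvent, FollowupClient, RemoteChild, wake_remote_child, and FlakyClient are hypothetical names, and the trait method merely stands in for the request to agent/runs/{run_id}/followups. The property it illustrates is that pending events are removed only after the follow-up submission succeeds, and failures are logged and retried a bounded number of times.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
struct PendingEvent {
    id: u64,
}

/// Stand-in for the server call to agent/runs/{run_id}/followups.
trait FollowupClient {
    fn submit_followup(&mut self, run_id: &str, events: &[PendingEvent]) -> Result<(), String>;
}

struct RemoteChild {
    run_id: String,
    pending: Vec<PendingEvent>,
}

/// Submits one follow-up covering the child's pending parent messages.
/// Returns how many events were delivered; on failure the pending list is
/// left untouched so a later attempt can still wake the child.
fn wake_remote_child<C: FollowupClient>(
    client: &mut C,
    child: &mut RemoteChild,
    max_attempts: u32,
) -> Result<usize, String> {
    if child.pending.is_empty() {
        return Ok(0);
    }
    let batch = child.pending.clone();
    let mut last_err = String::from("no attempts made");
    for attempt in 1..=max_attempts {
        match client.submit_followup(&child.run_id, &batch) {
            Ok(()) => {
                // Drop only the events that were part of the delivered batch.
                child.pending.retain(|e| !batch.iter().any(|d| d.id == e.id));
                return Ok(batch.len());
            }
            Err(e) => {
                eprintln!(
                    "follow-up attempt {}/{} for run {} failed: {}",
                    attempt, max_attempts, child.run_id, e
                );
                last_err = e;
            }
        }
    }
    Err(last_err)
}

/// Test double that fails a fixed number of times before succeeding.
struct FlakyClient {
    failures_left: u32,
    submitted: Vec<String>,
}

impl FollowupClient for FlakyClient {
    fn submit_followup(&mut self, run_id: &str, _events: &[PendingEvent]) -> Result<(), String> {
        if self.failures_left > 0 {
            self.failures_left -= 1;
            return Err("simulated transport error".to_string());
        }
        self.submitted.push(run_id.to_string());
        Ok(())
    }
}
```

Keeping the pending list intact on failure is what prevents the dropped-wake case called out in the second automated review, where events were drained locally without a follow-up being submitted.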